AITopics | ad generation

ad generation

MCAD: Multimodal Context-Aware Audio Description Generation For Soccer

Chaudhary, Lipisha, Mittal, Trisha, Gopalakrishnan, Subhadra, Nwogu, Ifeoma, Pytlarz, Jaclyn

arXiv.org Artificial Intelligence, Nov-13-2025

Audio Descriptions (AD) are essential for making visual content accessible to individuals with visual impairments. Recent works have shown a promising step towards automating AD, but they have been limited to describing high-quality movie content using human-annotated ground truth AD in the process. In this work, we present an end-to-end pipeline, MCAD, that extends AD generation beyond movies to the domain of sports, with a focus on soccer games, without relying on ground truth AD. To address the absence of domain-specific AD datasets, we fine-tune a Video Large Language Model on publicly available movie AD datasets so that it learns the narrative structure and conventions of AD. During inference, MCAD incorporates multimodal contextual cues such as player identities, soccer events/actions, and commentary from the game. These cues, combined with input prompts to the fine-tuned Video-LLM, allow the system to produce complete AD text for each video segment. We further introduce a new evaluation metric, ARGE-AD, designed to accurately assess the quality of generated AD. ARGE-AD evaluates the generated AD for the presence of five characteristics: (i) usage of people's names, (ii) mention of actions/events, (iii) appropriate length of AD, (iv) absence of pronouns, and (v) overlap from commentary/subtitles. We present an in-depth analysis of our approach on both movie and soccer datasets. We also validate the use of this metric to quantitatively assess the quality of generated AD across domains. Additionally, we contribute audio descriptions for 100 soccer game clips annotated by two AD experts. Audio Description (AD) is the descriptive spoken narration of visual content, primarily intended to assist people with visual impairments in accessing that content [1].
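The five characteristics that ARGE-AD looks for can be illustrated with a small rule-based checker. This is only a rough sketch and not the authors' implementation: the function name `check_ad`, the pronoun list, the word-count bounds, and the token-matching heuristics are all assumptions made for illustration, and the abstract does not say how the metric computes or weights each characteristic.

```python
import re
from dataclasses import dataclass

# Hypothetical pronoun list; the real metric may use a different one.
PRONOUNS = {"he", "she", "they", "him", "her", "them", "his", "hers", "their", "it", "its"}


@dataclass
class ADChecks:
    uses_names: bool           # (i) usage of people's names
    mentions_action: bool      # (ii) mention of actions/events
    length_ok: bool            # (iii) appropriate length
    no_pronouns: bool          # (iv) absence of pronouns
    commentary_overlap: float  # (v) fraction of AD words also found in commentary


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def check_ad(ad, names, actions, commentary, min_words=5, max_words=30):
    """Report the five characteristics for one AD sentence.

    min_words and max_words are assumed bounds, not values from the paper.
    """
    tokens = tokenize(ad)
    token_set = set(tokens)
    commentary_tokens = set(tokenize(commentary))
    overlap = len(token_set & commentary_tokens) / len(token_set) if token_set else 0.0
    return ADChecks(
        uses_names=any(name.lower() in token_set for name in names),
        mentions_action=any(action.lower() in token_set for action in actions),
        length_ok=min_words <= len(tokens) <= max_words,
        no_pronouns=not (token_set & PRONOUNS),
        commentary_overlap=overlap,
    )


result = check_ad(
    ad="Messi passes the ball to Alba near the box",
    names=["Messi", "Alba"],
    actions=["passes", "shoots"],
    commentary="What a pass from Messi!",
)
print(result)
```

The sketch only reports the overlap fraction and leaves open whether a higher or lower value should count in the AD's favour, since the abstract does not specify this.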

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2511.09448

Country:

Europe > Spain (0.28)
North America > United States (0.28)
Europe > Germany (0.28)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Sports > Soccer (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Audio Description Generation in the Era of LLMs and VLMs: A Review of Transferable Generative AI Technologies

Gao, Yingqiang, Fischer, Lukas, Lintner, Alexa, Ebling, Sarah

arXiv.org Artificial Intelligence, Oct-11-2024

Audio descriptions (ADs) function as acoustic commentaries designed to assist blind persons and persons with visual impairments in accessing digital media content on television and in movies, among other settings. As an accessibility service typically provided by trained AD professionals, the generation of ADs demands significant human effort, making the process both time-consuming and costly. Recent advancements in natural language processing (NLP) and computer vision (CV), particularly in large language models (LLMs) and vision-language models (VLMs), have brought automatic AD generation a step closer. This paper reviews the technologies pertinent to AD generation in the era of LLMs and VLMs: we discuss how state-of-the-art NLP and CV technologies can be applied to generate ADs and identify essential research directions for the future.

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2410.0886

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
South America > Uruguay > Maldonado > Maldonado (0.04)
North America > United States > New York (0.04)
(9 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Leisure & Entertainment (0.93)
Media > Television (0.46)
Health & Medicine > Therapeutic Area (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning

Zhang, Chaoyi, Lin, Kevin, Yang, Zhengyuan, Wang, Jianfeng, Li, Linjie, Lin, Chung-Ching, Liu, Zicheng, Wang, Lijuan

arXiv.org Artificial Intelligence, Nov-29-2023

We present MM-Narrator, a novel system leveraging GPT-4 with multimodal in-context learning for the generation of audio descriptions (AD). Unlike previous methods that primarily focused on downstream fine-tuning with short video clips, MM-Narrator excels in generating precise audio descriptions for videos of extensive lengths, even beyond hours, in an autoregressive manner. This capability is made possible by the proposed memory-augmented generation process, which effectively utilizes both the short-term textual context and long-term visual memory through an efficient register-and-recall mechanism. These contextual memories compile pertinent past information, including storylines and character identities, ensuring accurate tracking and depiction of story-coherent and character-centric audio descriptions. Maintaining the training-free design of MM-Narrator, we further propose a complexity-based demonstration selection strategy to substantially enhance its multi-step reasoning capability via few-shot multimodal in-context learning (MM-ICL). Experimental results on the MAD-eval dataset demonstrate that MM-Narrator consistently outperforms both the existing fine-tuning-based approaches and LLM-based approaches in most scenarios, as measured by standard evaluation metrics. Additionally, we introduce the first segment-based evaluator for recurrent text generation. Empowered by GPT-4, this evaluator comprehensively reasons and marks AD generation performance in various extendable dimensions.
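One way to picture a register-and-recall memory of the kind described above is a store that keeps a few recent AD sentences as short-term textual context and a growing list of visual features with character labels as long-term memory. The sketch below is a toy illustration under stated assumptions, not MM-Narrator's actual design: the class `NarrationMemory`, its methods, the plain-list feature vectors, and the cosine-similarity lookup are all hypothetical.

```python
import math
from collections import deque


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class NarrationMemory:
    """Toy memory: recent AD text (short-term) plus labeled visual features (long-term)."""

    def __init__(self, short_term_size=3):
        # Only the most recent AD sentences are kept as textual context.
        self.recent_ads = deque(maxlen=short_term_size)
        # Every registered (feature, label) pair is kept for later recall.
        self.visual_memory = []

    def register(self, feature, label, ad_text):
        """Store a visual feature with its character label and the AD just generated."""
        self.visual_memory.append((list(feature), label))
        self.recent_ads.append(ad_text)

    def recall(self, query, top_k=1):
        """Return the labels of the stored features most similar to the query."""
        ranked = sorted(
            self.visual_memory,
            key=lambda item: cosine(query, item[0]),
            reverse=True,
        )
        return [label for _, label in ranked[:top_k]]


memory = NarrationMemory(short_term_size=2)
memory.register([1.0, 0.0], "Alice", "Alice enters the room.")
memory.register([0.0, 1.0], "Bob", "Bob waves from the doorway.")
print(memory.recall([0.9, 0.1]))
print(list(memory.recent_ads))
```

In a real system the feature vectors would come from a visual encoder and the recalled labels and recent sentences would be placed into the prompt for the next segment; both steps are left out here.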

ad generation, mm-narrator, preprint arxiv, (15 more...)

arXiv.org Artificial Intelligence

2311.17435

Country: North America > United States (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Master AI Tools List

#artificialintelligence, Mar-4-2023, 23:25:55 GMT

The Master AI Tool List was created to share a comprehensive list of sites related to Artificial Intelligence and Machine Learning. We want to make this the source for discovering and sharing the latest AI tools. If you know of an AI tool that is not currently listed, use the submission form to request a review. Together we can make this an important resource for everyone. Ai Sofiya is a super Ai tool that can create social media ads in under a minute.

ad generation, master ai tool list, platform, (1 more...)

#artificialintelligence

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.37)
